Dataset presentation

Author

Thelma Panaïotis

Instrument overview

ISIIS: In Situ Ichthyoplankton Imaging System

  • organisms from 250 µm to 10 cm

  • sampling rate > 100 L s⁻¹

  • shadowgraphy

  • line scan camera

  • towed in an undulating pattern (“yos”)

ISIIS light scheme

ISIIS captures frames within a field of view of 10.5 × 10.5 × 50 cm (H × W × D). These frames are 2048 × 2048 px.

ISIIS frame, before rotation, flat fielding and equalisation.

Because it uses a line-scan camera and it is towed at ~constant speed, frames can be assembled together to recreate a continuous ribbon. For processing purposes (flat-fielding, segmentation…), 5 consecutive frames are assembled together to create an “image” of size 2048 × 10240 px.
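As a quick check of the numbers above, the sketch below derives the image size and the pixel size from the frame dimensions. It assumes square pixels and that the 10.5 cm field of view maps exactly onto the 2048 px of a frame; all variable names are illustrative.

```r
frame_px <- 2048   # frame side, in pixels
frame_cm <- 10.5   # frame side, in cm (field of view)
n_frames <- 5      # frames assembled into one image

# Size of an assembled image, in pixels and in cm
image_px <- c(height = frame_px, width = frame_px * n_frames)
image_cm <- c(height = frame_cm, width = frame_cm * n_frames)

# Pixel size in µm (1 cm = 10,000 µm), assuming square pixels
px_size_um <- frame_cm * 1e4 / frame_px

image_px
image_cm
round(px_size_um, 1)
```

Under these assumptions a pixel is about 51 µm wide, so the smallest organisms targeted (250 µm) span roughly 5 px.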

An ISIIS image, i.e. 5 consecutive frames, after rotation, flat fielding and equalisation.

See Faillettaz et al. 2016 for more details.

Dataset overview

The dataset consists of two tables:

  • images: processed images with associated environmental data (temperature, salinity, oxygen…) and metadata (coordinates, depth)

  • plankton: identified planktonic organisms with image name, position within image and a set of features

First, let’s keep only images in which plankton is present.

images <- images %>% filter(img_name %in% unique(plankton$img_name))

# List taxa
taxa <- plankton %>% pull(taxon) %>% unique() %>% sort()

The dataset consists of 17,980,741 identified planktonic organisms belonging to 29 taxonomic groups, distributed across 622,233 images. Each image is 2048 × 10,240 px, representing 10.5 × 52.5 cm.

Have a look at one image

Now let’s check what the data looks like for a given image.

# Choose an image
my_img <- images %>% filter(keep) %>% sample_n(1) %>% pull(img_name)

# Select objects within this image
df <- plankton %>% filter(img_name == my_img)

# Keep relevant information
df <- df %>% 
  select(transect, img_name, taxon, object_id, centroid_0, centroid_1) %>% 
  mutate(
    x = centroid_1 + 1, # need to add 1 because Python indexing starts at 0
    y = centroid_0 + 1, # need to add 1 because Python indexing starts at 0
    ) %>% 
  select(-contains("centroid"))

# Plot organisms position within image
df %>% 
  ggplot() +
  geom_point(aes(x = x, y = y, colour = taxon)) +
  scale_x_continuous(expand = c(0,0)) + scale_y_continuous(expand = c(0,0)) +
  coord_fixed() +
  theme_bw()

Now let’s take a step back.

“2013-07-28_02-58-43_974204” has many copepods; have a look at it!

Number of organisms per image

pl_per_img <- plankton %>% count(img_name)
summary(pl_per_img$n)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     2.0    11.0    20.0    28.9    38.0   547.0 
ggplot(pl_per_img) + 
  geom_density(aes(x = n)) +
  labs(x = "Organisms per image")

Let’s do a rank/frequency plot to check whether the tail of the distribution follows a power law.

pl_per_img %>% 
  filter(n > 11) %>% 
  # Number of images with exactly n organisms
  count(n, name = "count") %>% 
  arrange(desc(n)) %>% 
  # Number of images with at least n organisms
  mutate(cum_count = cumsum(as.numeric(count))) %>% 
  ggplot() +
  geom_path(aes(x = n, y = cum_count)) +
  scale_x_log10() + scale_y_log10() +
  labs(x = "Organisms per image (n)", y = "Images with at least n organisms")

Taxonomy

plankton %>% 
  count(taxon) %>% 
  arrange(n) %>% 
  mutate(taxon = fct_inorder(taxon)) %>% 
  ggplot() +
  geom_col(aes(x = n, y = taxon)) + 
  scale_x_log10()

Campaign overview

  • summer 2013: July 23rd to 29th

  • sampling between 0 and 100 m depth

  • 28 transects (see map below)

    • 7 along current

    • 7 cross current

    • 14 “lagrangian”

Map

camp <- images %>% 
  select(
    transect, img_name, 
    lon, lat, datetime, dist, depth,
    yo, yo_type, period,
    temp:dens
    )

camp %>% 
  group_by(transect, yo) %>% 
  slice(1) %>% 
  ungroup() %>% 
  mutate(transect_type = str_split_fixed(transect, "_", n = 2)[,1], .after = transect) %>% 
  ggplot(aes(x = lon, y = lat)) +
  geom_path(aes(group = transect, colour = transect_type)) +
  geom_polygon(data = coast, fill = "gray") +
  labs(x = "Longitude", y = "Latitude", colour = "Transect\ntype") +
  coord_quickmap(expand = FALSE)

Time

# Nice color palette
cols <- c(
    "dawn" = "#fc8d62", 
    "day" = "#80b1d3", 
    "dusk" = "#df65b0", 
    "night" = "#022660"
)

# Get beginning and end of each transect
transect_times <- camp %>% 
  group_by(transect) %>% 
  slice(c(1,n())) %>% 
  ungroup() %>% 
  select(transect, datetime, period)

transect_times %>% 
  ggplot() +
  geom_path(aes(x = datetime, y = transect, group = transect, colour = period), linewidth = 3) +
  scale_colour_manual(values = cols) +
  scale_x_datetime(
    date_breaks = "12 hours", 
    expand = c(0,0),
    labels = label_date_short(format = c("","", "%m/%d", "%H:%M"), sep = "\n"),
    limits = c(
      floor_date(min(transect_times$datetime), unit = "day"), 
      ceiling_date(max(transect_times$datetime), unit = "day")
    )
  ) +
  labs(x = "Datetime (CEST)", y = "Transect", color = "Period") 

Environmental data

Environmental conditions during one of the cross-current transects.

Things to account for

Non biological

The dataset has a few drawbacks that need to be accounted for:

  • segmentation recall (~92%) ➝ run simulations?

  • classification precision and recall

  • non-homogeneous turbulence

  • varying conditions (environment, day vs night…)

  • variations in towing speed: the pixel size along the x axis can vary. Round objects (solitary Collodaria) are present in the dataset, and their deformation can provide information to correct this bias.

  • a 3D volume is projected onto a 2D image: what information does this give us regarding the true position of organisms? ➝ run simulations
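The projection issue in the last point can be explored with a small simulation. The sketch below is only one possible setup: it assumes organisms are uniformly distributed in the imaged volume (52.5 × 10.5 × 50 cm for one image), uses a hypothetical number of organisms, and compares true 3D distances with the distances seen in the 2D image.

```r
set.seed(1)
n <- 30  # organisms in the imaged volume (hypothetical value)

# Uniform random positions in the imaged volume, in cm
pts <- cbind(
  x = runif(n, 0, 52.5),  # along the tow direction
  y = runif(n, 0, 10.5),  # across the image
  z = runif(n, 0, 50)     # depth of field, lost in the projection
)

d3 <- as.numeric(dist(pts))                 # true 3D distances
d2 <- as.numeric(dist(pts[, c("x", "y")]))  # distances as seen in the image

# Projection can only shorten distances, never lengthen them
all(d2 <= d3)

# Ratio of apparent to true distance for each pair of organisms
summary(d2 / d3)
```

Since the depth of field (50 cm) is much larger than the image height (10.5 cm), two organisms that look close in the image can be far apart in the water, which is worth keeping in mind when interpreting short distances.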

Biological

  • motility: some organisms are motile, others are not

  • division

  • size

Steps of analyses

Null hypothesis

The distribution below is an example of what we could expect for distances in the case of randomly distributed objects (i.e. null hypothesis).

ggplot(data.frame(x = c(0, 10)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 1), aes(colour = "Null")) +
  labs(x = "d", y = "p") +
  scale_colour_manual("", values = c("Null" = "black")) +
  theme_classic() +
  theme(axis.text = element_blank(), axis.ticks = element_blank())
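The normal curve above is only an illustrative shape. A simulated null distribution could be obtained along the following lines: this sketch assumes complete spatial randomness within the 2D image (10,240 × 2048 px) and 20 organisms per image (the median reported above); the function name `null_distances` is hypothetical.

```r
set.seed(42)

# Pairwise distances between n points placed uniformly at random in one image
null_distances <- function(n, width = 10240, height = 2048) {
  xy <- cbind(runif(n, 0, width), runif(n, 0, height))
  as.numeric(dist(xy))
}

# Pool distances over many simulated images of 20 organisms each
d_null <- unlist(lapply(1:200, function(i) null_distances(20)))

# Distances are in pixels and bounded by the image diagonal
summary(d_null)
```

Calling `hist(d_null)` would show the shape of this null; because the image is elongated, it is not expected to be symmetric like the normal curve used for illustration.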

Distances between all organisms

Null hypothesis + all distances.

ggplot(data.frame(x = c(0:10)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 1), aes(colour = "Null")) +
  stat_function(fun = dgamma, args = list(shape = 5, rate = 1.9), aes(colour = "All")) +
  scale_colour_manual("", values = c("Null" = "black", "All" = "red")) +
  labs(x = "d", y = "p") +
  theme_classic() +
  theme(axis.text = element_blank(), axis.ticks = element_blank())

If the distances between all organisms have a distribution that differs from the null hypothesis, then planktonic organisms are not randomly distributed.

Another way to test for overall randomness: draw quadrats of equal size in images; there should be approximately the same number of organisms in each quadrat (for a given concentration, i.e. not necessarily across images).
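The quadrat approach could be sketched as follows for a single image. The positions are simulated here, the quadrat side (1024 px) is an arbitrary choice, and the two summaries (variance-to-mean ratio and chi-squared test) are only suggestions for how to compare quadrat counts.

```r
set.seed(7)
width  <- 10240  # image width in px
height <- 2048   # image height in px
side   <- 1024   # quadrat side in px (hypothetical choice)

# Toy positions standing in for the organisms of one image
x <- runif(200, 0, width)
y <- runif(200, 0, height)

# Assign each organism to a quadrat and count organisms per quadrat
qx <- cut(x, breaks = seq(0, width, by = side), include.lowest = TRUE)
qy <- cut(y, breaks = seq(0, height, by = side), include.lowest = TRUE)
counts <- as.vector(table(qx, qy))

# Under randomness, counts are Poisson distributed: variance close to mean
vmr <- var(counts) / mean(counts)
vmr

# Chi-squared test against equal expected counts in all quadrats
chisq.test(counts)
```

A variance-to-mean ratio well above 1 would suggest aggregation, and a ratio well below 1 a more regular spacing than expected at random.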

Intraspecific

Let’s label our planktonic organisms i, j, k… Plot intraspecific distances for all groups (dii, djj…).

ggplot(data.frame(x = c(0:10)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 1), aes(colour = "Null")) +
  stat_function(fun = dgamma, args = list(shape = 5, rate = 1.9), aes(colour = "All")) +
  stat_function(fun = dgamma, args = list(shape = 6, rate = 2), aes(colour = "d<sub>ii</sub>")) +
  scale_colour_manual("", values = c(
    "Null" = "black", 
    "All" = "red", 
    "d<sub>ii</sub>" = "dodgerblue"
  )) +
  labs(x = "d", y = "p") +
  theme_classic() +
  theme(axis.text = element_blank(), axis.ticks = element_blank(), legend.text = ggtext::element_markdown())

This should reveal the presence/absence of intraspecific interactions:

  • if dii = Null, then random distribution for group i, i.e. no interactions

  • if dii tends to be larger than the Null distribution, then organisms of group i are more spaced out than at random. This would allow a better partitioning of resources if these are evenly distributed (this could be investigated within areas of evenly distributed resources, e.g. chlorophyll a). Ideal free distribution (IFD)?

  • if dii tends to be smaller than the Null distribution, then organisms of group i are aggregated. This could reveal an aggregation of resources, or predation in the case of cannibalism.
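Intraspecific distances could be computed along these lines. This is a sketch on a toy table standing in for `plankton`, with hypothetical columns `x` and `y` for positions in pixels; distances are computed only between organisms of the same taxon within the same image.

```r
set.seed(3)

# Toy table standing in for `plankton` (one row per organism)
toy <- data.frame(
  img_name = rep(c("img_a", "img_b"), each = 10),
  taxon    = sample(c("i", "j"), 20, replace = TRUE),
  x        = runif(20, 0, 10240),
  y        = runif(20, 0, 2048)
)

# One group per (image, taxon) combination present in the data
groups <- split(toy, list(toy$img_name, toy$taxon), drop = TRUE)

# Distances between organisms of the same taxon, within each image
d_intra <- do.call(rbind, lapply(groups, function(g) {
  if (nrow(g) < 2) return(NULL)  # a distance needs at least two organisms
  data.frame(
    img_name = g$img_name[1],
    taxon    = g$taxon[1],
    d        = as.numeric(dist(g[, c("x", "y")]))
  )
}))

# One distribution of distances per taxon, to compare with the null
aggregate(d ~ taxon, data = d_intra, FUN = median)
```

On the full dataset the number of pairs grows quadratically with the number of organisms per image, so subsampling images or taxa may be needed.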

Interspecific

For this, we use pairs of groups i,j and compute distances between all organisms of types i,j within images (dij, dik…).

ggplot(data.frame(x = c(0:10)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 1), aes(colour = "Null")) +
  stat_function(fun = dgamma, args = list(shape = 5, rate = 1.9), aes(colour = "All")) +
  stat_function(fun = dgamma, args = list(shape = 6, rate = 2), aes(colour = "d<sub>ii</sub>")) +
  stat_function(fun = dgamma, args = list(shape = 7, rate = 2.5), aes(colour = "d<sub>ij</sub>")) +
  stat_function(fun = dgamma, args = list(shape = 3, rate = 1.5), aes(colour = "d<sub>ik</sub>")) +
  scale_colour_manual("", values = c(
    "Null" = "black", 
    "All" = "red", 
    "d<sub>ii</sub>" = "dodgerblue",
    "d<sub>ij</sub>" = "darkgreen",
    "d<sub>ik</sub>" = "lightgreen"
  )) +
  labs(x = "d", y = "p") +
  theme_classic() +
  theme(axis.text = element_blank(), axis.ticks = element_blank(), legend.text = ggtext::element_markdown())

This should reveal the presence/absence of interspecific interactions:

  • if dij = Null, then groups i and j do not seem to interact

  • if dij tends to be larger than Null, then groups i and j are avoiding each other (competition for resources, IFD in case of homogeneous resources)

  • if dij tends to be smaller than Null, then groups i and j are closer to each other, which could reveal interactions such as competition for aggregated resources, or predation of one on the other.
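For a given image, distances between organisms of two different groups could be computed as in the sketch below. Positions are simulated and the helper `cross_dist` is hypothetical; it returns one distance per (i, j) pair.

```r
set.seed(5)

# Toy positions (px) of two groups within one image
pos_i <- cbind(x = runif(4, 0, 10240), y = runif(4, 0, 2048))
pos_j <- cbind(x = runif(6, 0, 10240), y = runif(6, 0, 2048))

# Matrix of distances: rows are organisms of a, columns are organisms of b
cross_dist <- function(a, b) {
  dx <- outer(a[, "x"], b[, "x"], "-")
  dy <- outer(a[, "y"], b[, "y"], "-")
  sqrt(dx^2 + dy^2)
}

d_ij <- cross_dist(pos_i, pos_j)
dim(d_ij)  # 4 x 6: one distance per (i, j) pair
summary(as.numeric(d_ij))
```

Pooling `as.numeric(d_ij)` over images gives the dij distribution to compare with the null.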

Prey vs predator point of view

Let’s imagine i preys on j, then

  • from the point of view of i, it makes sense to get closer to j, so that dij should be shifted to the left

  • from the point of view of j, it makes sense to avoid i, so that dij should be shifted to the right

Which is correct?